Abstract

This paper investigates the design and application of write-once memory (WOM) codes for flash memory storage. Using ideas from Merkx [1], we present a construction of WOM codes based on finite Euclidean geometries over $\mathbb{F}_2$. This construction yields WOM codes with new parameters and provides insight into the criteria that incidence structures should satisfy to give rise to good codes. We also analyze methods of adapting binary WOM codes for use on multilevel flash cells. In particular, we give two strategies based on different rewrite objectives. A brief discussion of the average-write performance of these strategies, as well as of concatenation methods for WOM codes, is also provided.
1 Introduction

Non-volatile flash memories are becoming increasingly popular due to their potential for high throughputs and low power consumption. Flash memory storage is based on organizing the memory into blocks of cells in which each cell can be charged up to one of $q$ levels. While increasing the charge of a cell is easy, decreasing it is costly, since the entire block containing the cell must be erased and rewritten. Such an operation involves reprogramming roughly $10^5$ cells. Moreover, frequent block erasures also reduce the lifetime of the flash device. It is therefore desirable to be able to write as many times as possible before having to erase a block [2] [3] [4] [5]. Like any storage device, flash cells are also prone to errors due to charge leakage or the writing process. Thus, the coding design goals for flash memories include maximizing the number of writes between block erasures, correcting cell charge leakage errors, and correcting errors that occur during the writing process.

An information-theoretic approach to writing on memories with defects was first considered by Kuznetsov and Tsybakov [6], and later surveyed in [7]. The write-once memory (WOM) model, introduced by Rivest and Shamir [2], and other constrained memory models (WUM, WIM, WEM) can be considered as particular cases of the general defective channel [8] [9]. Due to the asymmetric costs associated with increasing and decreasing cell levels, the flash memory model can be viewed as a generalization of the WOM model. As a result, WOM codes have been proposed for flash cells having two levels (i.e., capable of storing one bit of information per cell) |14 [ ll2 t fT7]. Error-correcting codes for the general defective channel and for WOM have also been considered, although addressing errors while incorporating rewriting capabilities is difficult, and many codes in the literature are optimized primarily for one of these goals [5j [Tj)J [UJ HI El E].

This paper is organized as follows. The rest of this section provides notation and background on 
codes for flash memories. In Section 2, after summarizing Merkx's construction of WOM codes from 
finite projective geometries, we present a new construction of WOM codes using finite Euclidean 
geometries. In Section 3 we explore methods of adapting binary WOM codes for multilevel flash 
cells, and introduce two strategies that achieve this with respect to different goals. We also examine the average-case write performance of these strategies for two specific WOM codes. Finally, in Section 4 we summarize ways to combine WOM codes with classical error-correcting codes using concatenation.
We conclude the paper in Section 5 with some future directions. 

1.1 Preliminaries 

We now give some definitions and notation that will be used in this paper. A write-once memory (WOM) is a storage device over a binary alphabet in which a zero can be increased to a one, but a one cannot be changed back to a zero. An information message is encoded and stored in a string of cells in the memory, referred to as a cell state vector¹. The cells in the cell state vector form the codeword and can be updated, or rewritten, to represent a different message. Only the most recently written message is retained.

A write-once memory code is composed of a set $V$ of information words, called variable vectors, and a set $S$ of cell state vectors with $S \subseteq \mathbb{F}_2^n$, corresponding to the codewords of the WOM code. Many different cell state vectors can represent the same information message. In addition, the WOM code is equipped with encoding and decoding functions. The encoding function takes as inputs both the current state of the memory and the new information message to be stored. Specifically, it maps the current cell state vector to an updated cell state vector that represents the new information message and is componentwise greater than or equal to the previous state. The decoding function maps the resulting cell state vector to the updated information message. The number of information messages that can be encoded at each time step need not be the same, however, as the following notation conveys.

Definition 1.1 Let $(v_1, \dots, v_t)/n$ denote a $t$-write WOM code on $n$ cells, where $v_i$ is the number of messages that can be represented on the $i$th write. In the fixed-information case, i.e., when $v_1 = \cdots = v_t = v$, such a WOM code will be denoted by $(v)^t/n$.

The rate of a WOM code is

$$\frac{\log_2(v_1 \cdots v_t)}{n}.$$
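The rate formula is easy to evaluate numerically. The sketch below (the function name `wom_rate` is our own) sums base-2 logarithms instead of forming the product $v_1 \cdots v_t$, and checks the formula on parameter sets that appear in this paper.

```python
from math import log2


def wom_rate(message_counts, n):
    """Rate log2(v_1 * ... * v_t) / n of a (v_1, ..., v_t)/n WOM code."""
    # Summing logarithms avoids building a large product for many writes.
    return sum(log2(v) for v in message_counts) / n


print(round(wom_rate([4, 4], 3), 2))        # Rivest-Shamir (4)^2/3: 1.33
print(round(wom_rate([7] * 4, 7), 2))       # Merkx PG(2,2) (7)^4/7: 1.6
print(round(wom_rate([8, 8, 8, 4], 8), 3))  # EG(3,2) (8,8,8,4)/8: 1.375
```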

When $q = 2$, the flash cell is called a single-level cell (SLC), since the cell can only represent one nonzero value, and a multilevel cell (MLC) when $q > 2$, as it can store values in $\{0, 1, 2, \dots, q-1\}$. Note that an SLC can store one bit of information per cell, whereas an MLC can store multiple bits of information per cell. Fiat and Shamir considered a generalized version of a WOM in which the storage cells have more than two states, with transitions given by a directed acyclic graph [3]. The idea of extending to multilevel cells was further explored by Jiang in [13], in which he considered generalizing error-correcting WOM codes. Techniques for rewriting codes on $q$-ary cells include floating codes, which were introduced by Jiang, Bohossian, and Bruck [15], and, more generally, trajectory codes, which are described in [16]. Although these are similar objects, we will use the term flash codes, introduced in [17], to refer to a rewriting code on multilevel cells.

¹ This terminology was introduced in [13] in reference to the structure of flash memory, but it is convenient to use in the WOM case as well.

Information   1st write   2nd write
00            000         111
01            100         011
10            010         101
11            001         110

Table 1: $(4)^2/3$ WOM code by Rivest and Shamir.

Definition 1.2 When $q > 2$, $(v)^t/n$ will denote a $t$-write flash code for use on cells having $q$ levels, where the code has block length $n$ and $v$ messages can be represented at each write. The capacity of a flash memory is the maximum number of writes possible for $n$ cells, $v$ information messages to be represented in each write, and $q$ levels per cell.

Fu and Han Vinck [18] showed that the maximum total number of information bits that can be stored per cell over $t$ writes is at most

$$\log_2 \binom{t+q-1}{q-1}.$$

The next example, from [2], is the canonical example of a WOM code.

Example 1.3 The Rivest and Shamir WOM code is shown in Table 1 [2]. It maps two information bits to three coded bits and is capable of tolerating two writes. Note that any of the four messages may be written at either write. The table is interpreted as follows: on the first write, the encoding function takes the current all-zero state and the new information message and maps it to the representation of that message in the '1st write' column. On the second write, the encoding function takes the current cell state and the new information message and outputs the cell state vector opposite the new message in the '2nd write' column. For example, the message sequence 01 → 11 would be recorded as 100 → 110. If the new information message is the same as the information represented by the current cell state vector, the memory remains unchanged. Decoding is as follows: the cell state vector $(a_1, a_2, a_3)$ can be decoded as $\big((a_2 + a_3) \bmod 2,\ (a_1 + a_3) \bmod 2\big)$.

□ 
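A minimal Python sketch of the Rivest-Shamir code follows; `RS_TABLE`, `rs_decode`, and `rs_write` are illustrative names of our own. The writer uses the 1st-write codeword of the new message whenever no cell would have to decrease, and otherwise the 2nd-write codeword; this is our reading of the table-driven rule above, not a prescribed implementation.

```python
# Table 1: message -> (1st-write codeword, 2nd-write codeword)
RS_TABLE = {
    (0, 0): ((0, 0, 0), (1, 1, 1)),
    (0, 1): ((1, 0, 0), (0, 1, 1)),
    (1, 0): ((0, 1, 0), (1, 0, 1)),
    (1, 1): ((0, 0, 1), (1, 1, 0)),
}


def rs_decode(state):
    """Decode (a1, a2, a3) as ((a2 + a3) mod 2, (a1 + a3) mod 2)."""
    a1, a2, a3 = state
    return ((a2 + a3) % 2, (a1 + a3) % 2)


def rs_write(state, message):
    """Return the new cell state; raise if every option would decrease a cell."""
    if rs_decode(state) == message:
        return state                        # same message: memory unchanged
    for candidate in RS_TABLE[message]:     # 1st-write form first, then 2nd
        if all(c >= s for c, s in zip(candidate, state)):
            return candidate
    raise ValueError("block erasure required")


state = (0, 0, 0)
state = rs_write(state, (0, 1))   # (1, 0, 0)
state = rs_write(state, (1, 1))   # (1, 1, 0)
print(state, rs_decode(state))    # (1, 1, 0) (1, 1)
```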

2 Finite Geometry WOM-codes 

In this section, we apply ideas from [1] to design WOM codes based on finite Euclidean geometries.
We first provide some relevant definitions. 

Definition 2.1 The $m$-dimensional Euclidean geometry over $\mathbb{F}_2$, denoted by EG(m,2), is an incidence structure with $2^m$ points and $2^{m-1}(2^m - 1)$ lines. The points in EG(m,2) may be regarded as all $m$-tuples over $\mathbb{F}_2$, and each pair of points defines a line.
Figure 1: Four writes using the Merkx PG(2,2) WOM code. The cell state vectors after the four writes are (0010000), (0110000), (0110101), and (1110101).



Note that the set of points in EG(m,2) forms an $m$-dimensional vector space over $\mathbb{F}_2$. A $\mu$-flat in EG(m,2) is a $\mu$-dimensional subspace of the finite geometry, defined next. We will use the term hyperplane to refer to a subspace of dimension $m-1$ in either PG(m,2) or EG(m,2).

Definition 2.2 Let $X$ be the set of points in EG(m,2). A $\mu$-flat in EG(m,2) passing through a point $a_0$ consists of the points of the form $a_0 + \beta_1 a_1 + \cdots + \beta_\mu a_\mu$, where $a_0, \dots, a_\mu \in X$, the points $a_1, \dots, a_\mu$ are linearly independent, and $\beta_1, \dots, \beta_\mu \in \mathbb{F}_2$.

The number of $\mu$-flats in EG(m,2) is

$$2^{m-\mu} \prod_{i=1}^{\mu} \frac{2^{m-i+1} - 1}{2^{\mu-i+1} - 1}.$$

Moreover, each $\mu$-flat in EG(m,2) is a coset of an EG(μ,2), and thus contains $2^\mu$ points.
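The flat-counting formula can be checked numerically. The sketch below (the name `num_flats` is ours) evaluates it with exact integer arithmetic; with $\mu = 1$ it reproduces the line count $2^{m-1}(2^m - 1)$ of Definition 2.1, and with $m = 4$, $\mu = 2$ it gives the 140 planes of EG(4,2) used in Example 2.7.

```python
from math import prod


def num_flats(m, mu):
    """Number of mu-flats in EG(m, 2):
    2^(m - mu) * prod_{i=1..mu} (2^(m-i+1) - 1) / (2^(mu-i+1) - 1)."""
    numerator = prod(2 ** (m - i + 1) - 1 for i in range(1, mu + 1))
    denominator = prod(2 ** (mu - i + 1) - 1 for i in range(1, mu + 1))
    # The quotient is a Gaussian binomial coefficient, so integer division is exact.
    return 2 ** (m - mu) * numerator // denominator


print(num_flats(3, 1), num_flats(4, 2))  # 28 140
```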

Definition 2.3 The finite projective geometry of dimension $m$ over $\mathbb{F}_2$, denoted PG(m,2), is an incidence structure with $2^{m+1} - 1$ points and $\frac{(2^{m+1}-1)(2^m-1)}{3}$ lines. The points are the nonzero $(m+1)$-tuples $(a_0, a_1, \dots, a_m) \in \mathbb{F}_2^{m+1}$, and a line through two distinct points $a$ and $b$ contains exactly the set of points $\{a, b, a + b\}$.



For more details, see [19] and [20].

Merkx constructed a family of WOM codes based on the $m$-dimensional finite projective geometries over $\mathbb{F}_2$ [1]. The construction exploits a connection between the binary Hamming codes and PG(m,2) that allows the WOM codes to be decoded via syndrome decoding. Specifically, the minimum-weight codewords of the $[2^{m+1}-1,\ 2^{m+1}-m-2,\ 3]$ Hamming code $C$ generate $C$ and correspond to the incidence vectors of lines in PG(m,2). In Merkx's construction, the messages correspond to points in the geometry. The WOM codewords, i.e., the cell state vectors, are a subset of $\mathbb{F}_2^{2^{m+1}-1} \setminus C$, and thus, since the Hamming code is perfect, these codewords are always at Hamming distance one from a binary Hamming codeword. The location of the "error" indicates the point in the geometry that corresponds to the information message.
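To make the syndrome-decoding step concrete, here is a small sketch. It assumes a labeling in which cell $j$ (counting from 1) corresponds to the point of PG(m,2) whose coordinates are the binary expansion of $j$; with this labeling, column $j$ of the Hamming parity-check matrix is the binary expansion of $j$, lines are the triples $\{a, b, a \oplus b\}$, and the syndrome is the XOR of the indices of the cells set to one. The labeling and the name `merkx_decode` are our assumptions and need not match the labeling used in Figure 1.

```python
from functools import reduce
from operator import xor


def merkx_decode(state):
    """Return the point (an integer 1..n) represented by a PG(m,2) WOM state.

    Assumed labeling: cell j (1-based) is the point whose coordinates are the
    binary expansion of j, so the Hamming syndrome is the XOR of the indices
    of the cells that are set to one.
    """
    n = len(state)
    if n & (n + 1):
        raise ValueError("length must be 2^(m+1) - 1")
    syndrome = reduce(xor, (j for j, bit in enumerate(state, start=1) if bit), 0)
    if syndrome == 0:
        raise ValueError("state is a Hamming codeword, not a WOM codeword")
    return syndrome


# Union of the lines {1, 2, 3} and {1, 4, 5}: decodes to their common point.
print(merkx_decode([1, 1, 1, 1, 1, 0, 0]))  # 1
```

The same function handles every substructure in the Fano-plane example: a single point, a line missing a point, a line plus a point off it, and the plane missing a point all decode to the highlighted point.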

Example 2.4 The PG(2,2) WOM code of [1] is a $(7)^4/7$ code. Each position of a codeword corresponds to a point of the Fano plane, and each codeword is the incidence vector of a substructure of the geometry that highlights the particular point being represented. Codewords are incidence vectors of the following: on the first write, a point of the Fano plane; on the second write, a line missing a point; on the third write, a line with a point off of it; on the final write, either the union of two lines or the plane missing a point. To decode the WOM code, Merkx observed that syndrome decoding identifies the information message. Figure 1 shows the write sequence 3 → 5 → 7 → 3 using the $(7)^4/7$ code from the Fano plane. The arrow indicates the information point, and the corresponding cell state vector representing that information is listed below each write. Note that the sequence of cell state vectors is monotonically increasing in each component as the writes progress.

□

Figure 2: The message sequence 1 → 3 → 2 → 7 in the EG(3,2) WOM code, with panels (a) to (d) showing writes 1 to 4. The corresponding cell state vectors are (10000000), (11010000), (11011001), and (11111101).

The following proposition, by Cohen, Godlewski, and Merkx in [21], formulates more precisely the parameters of the WOM codes that result from this construction method.

Proposition 2.5 The number of writes that can be attained with a length-$(2^m - 1)$ WOM code, storing $m$ bits on each write, is $2^{m-2} + 2$.

2.1 WOM codes from EG(m, 2) 

We now extend Merkx's idea and design WOM codes from EG(m,2). Since Hamming codes are punctured Reed-Muller codes, and are given by geometric designs over the binary field, a construction similar to the method above can be applied to EG(m,2). Minimum-weight codewords also generate the $r$th-order Reed-Muller code $\mathcal{R}(r,m)$ of length $2^m$, and correspond to $(m-r)$-flats in the Euclidean geometry EG(m,2). Analogous to the Merkx construction, we will use the connection between minimum-weight words in $\mathcal{R}(m-2,m)$ and the planes in EG(m,2) to construct our WOM code. The codewords are designed to be at Hamming distance one from a codeword of $\mathcal{R}(m-2,m)$, and thus are incidence vectors of configurations of points in the Euclidean geometry. Such substructures include a point, a plane with a point missing, and a plane with a point off of it. These WOM codes may be decoded using any Reed-Muller decoding technique.
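One concrete choice of Reed-Muller decoder is syndrome decoding, since $\mathcal{R}(m-2,m)$ is the $[2^m, 2^m-m-1, 4]$ extended Hamming code. The sketch below assumes that cell $p$ (counting from 1) corresponds to the point of EG(m,2) whose coordinates are the binary expansion of $p-1$. Under this assumed labeling, a codeword has even weight and the labels of its ones XOR to zero, so for a state at distance one the XOR of the labels of its ones is the label of the "error" position. The labeling and the name `eg_decode` are ours, but the labeling is consistent with the cell state vectors listed for Figure 2.

```python
from functools import reduce
from operator import xor


def eg_decode(state):
    """Return the message (1..2^m) stored in an EG(m,2) WOM cell state vector.

    Assumed labeling: cell p (1-based) is the point of EG(m,2) whose
    coordinates are the binary expansion of p - 1.
    """
    n = len(state)
    if n < 8 or n & (n - 1):
        raise ValueError("length must be 2^m with m >= 3")
    if sum(state) % 2 == 0:
        raise ValueError("even weight: not at distance one from R(m-2, m)")
    # Syndrome = XOR of the labels of the ones = label of the 'error' cell.
    return 1 + reduce(xor, (i for i, bit in enumerate(state) if bit), 0)


figure2 = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 0, 1],
]
print([eg_decode(v) for v in figure2])  # [1, 3, 2, 7]
```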

The next two examples illustrate this construction for m = 3 and m = 4. 

Example 2.6 Using EG(3,2), the resulting code is an $(8,8,8,4)/8$ WOM code. In other words, the code attains four writes on eight cells, where eight possible messages can be stored in each of the first three writes, and four messages can be stored in the fourth write. Recall that EG(3,2) has eight points, 28 lines, and 14 planes. Each message corresponds to one of the points in the geometry. On the first write, a message $i \in \{1, \dots, 8\}$ is represented by a weight-one cell state vector, where the one is in the $i$th coordinate. On the second write, a weight-three cell state vector indicates a plane with a point missing, where the missing point is the information message. On the third write, the ones in the cell state vector correspond to a plane with a point off of it, where the point off the plane is the message. Observe that on each of the first three writes, it is possible to represent any of the eight messages. Finally, on the fourth write, only messages corresponding to positions of the cell state vector with entry zero can be represented (except for the message represented in the third write, which can always remain on the fourth write, if needed). If $i$ is one of these messages, then to represent $i$ on the fourth write, the cell state vector will have a one in every coordinate except position $i$.

As an example, the message sequence 1 → 3 → 2 → 7 is demonstrated in Figure 2.

□ 
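The write rules of Example 2.6 can be sketched as a brute-force search. Because $\mathcal{R}(1,3)$ has 16 codewords whose radius-one neighborhoods are disjoint and together contain all $2^7 = 128$ odd-weight vectors of length 8, every odd-weight state decodes to a unique point, and the states after writes 1 through 4 are exactly the vectors of weight 1, 3, 5, and 7. The sketch assumes that cell $p$ corresponds to the point whose coordinates are the binary expansion of $p-1$, raises one cell on the first write and two cells on each later write, and tries cells in lexicographic order. That order is our choice, since the example does not say which plane to use when several qualify; with it, the sequence 1 → 3 → 2 → 7 produces the cell state vectors listed for Figure 2. The function names are hypothetical.

```python
from functools import reduce
from itertools import combinations
from operator import xor


def eg32_decode(state):
    """Message 1..8 = 1 + XOR of the labels p - 1 of the cells set to one."""
    return 1 + reduce(xor, (i for i, bit in enumerate(state) if bit), 0)


def eg32_write(state, message):
    """Return a state >= `state` that decodes to `message`, or None.

    The state is returned unchanged if it already represents `message`;
    None means no admissible state exists (a block erasure is required).
    """
    if any(state) and eg32_decode(state) == message:
        return list(state)
    extra = 2 if any(state) else 1              # weights 0 -> 1 -> 3 -> 5 -> 7
    zeros = [i for i, bit in enumerate(state) if not bit]
    for cells in combinations(zeros, extra):    # lexicographic order (our choice)
        candidate = list(state)
        for i in cells:
            candidate[i] = 1
        if eg32_decode(candidate) == message:
            return candidate
    return None


states, state = [], [0] * 8
for message in (1, 3, 2, 7):
    state = eg32_write(state, message)
    states.append("".join(map(str, state)))
print(states)  # ['10000000', '11010000', '11011001', '11111101']
```

After the third write only the three zero positions (plus the current message) remain writable, which matches the four-message limit on the fourth write.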
Figure 3: EG(4,2), with four parallel planes shaded, as in [20].



In constructing the WOM code from EG(3,2), it is not possible to represent more than four messages on the fourth write. Indeed, after the third write, the cell state vector contains five ones and three zeros, so at most $\log_2 3$ information bits can be conveyed by the remaining zero-valued positions. The message that is stored in the third write can always be represented on the fourth write, simply by leaving the memory state unchanged. Thus, one of at most four messages can be represented on the fourth write.

Example 2.7 Using EG(4,2), the resulting WOM code has parameters

$$(16,16,16,12,8,8,8,4)/16.$$

Recall that EG(4,2), shown in Figure 3, has 16 points and 140 planes, and can be partitioned into two parallel 3-flats. The first four writes are the same as in Example 2.6, using the EG(3,2) code on a 3-flat that contains the points corresponding to the first four information messages. After the fourth write, the points in that 3-flat are all programmed to one, and the EG(3,2) WOM code may be applied to the points of the remaining 3-flat to encode the final four writes.

□ 

Proposition 2.8 The EG(m,2) WOM code achieves $4(m-2)$ writes and has parameters

$$\big(\underbrace{2^m, 2^m, 2^m, 2^m - 4,\ 2^{m-1}, 2^{m-1}, 2^{m-1}, 2^{m-1} - 4,\ \dots,\ 8, 8, 8, 4}_{4(m-2)}\big)/2^m.$$

Proof: The cell state vector has length $2^m$, equal to the number of points in EG(m,2). Recall that each cell state vector in the EG(m,2) WOM code will be at Hamming distance one from a codeword of the Reed-Muller code $\mathcal{R}(m-2,m)$. We proceed by induction on the dimension of the finite geometry. The base case is the EG(3,2) WOM code. Now suppose that there exists an EG(k,2) WOM code with the parameters stated above for $m = k$. Consider the finite Euclidean geometry EG(k+1,2). Note that EG(k+1,2) can be partitioned into two parallel hyperplanes, i.e., two disjoint copies of EG(k,2). Since any four points lie on a common hyperplane (in fact, on many), there exists a hyperplane that contains the points corresponding to the first four information messages to be written. These messages can be encoded using the EG(3,2) WOM code on a cube within this hyperplane containing those points. After the first four writes, all points in the hyperplane are set to one, and the EG(k,2) code can be used on the remaining hyperplane. Thus, this EG(k+1,2) WOM code allows for $4((k+1)-2)$ writes and has the parameters listed above, with $m = k+1$. □
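The parameter list in Proposition 2.8 follows the recursion in the proof: four writes with message counts $(2^m, 2^m, 2^m, 2^m - 4)$ on one hyperplane, followed by the code for the next smaller dimension on the parallel hyperplane. A short sketch (the name `eg_wom_parameters` is ours) generates the list and confirms the write count $4(m-2)$.

```python
def eg_wom_parameters(m):
    """Message counts per write for the EG(m,2) WOM code of Proposition 2.8."""
    if m < 3:
        raise ValueError("the construction starts at m = 3")
    if m == 3:
        return [8, 8, 8, 4]                  # base case: Example 2.6
    top = 2 ** m
    # Four writes on one hyperplane, then the EG(m-1,2) code on the other one.
    return [top, top, top, top - 4] + eg_wom_parameters(m - 1)


print(eg_wom_parameters(4))  # [16, 16, 16, 12, 8, 8, 8, 4]
```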

Since codewords of the WOM code are at Hamming distance one from a codeword of the corresponding Reed-Muller code, performing syndrome decoding on a stored cell state vector will provide the location of the "error". The code is designed so that this position corresponds to an information message, i.e., a point in the geometry. Thus, syndrome decoding identifies the message and can be used to decode the EG(m,2) WOM code.

Code      Length   Rate
PG(2,2)   7        1.60
EG(3,2)   8        1.38
PG(3,2)   15       1.82
EG(4,2)   16       1.66
PG(4,2)   31       1.60
EG(5,2)   32       1.50

Table 2: Comparison of rates of small-dimension projective and Euclidean geometry WOM codes.

2.2 Comparison 

Table 2 shows the rates of the proposed EG(m,2) WOM codes and the PG(m,2) WOM codes from [1] for small values of $m$. As expected from the geometric structure, the EG WOM codes have lower rates than PG WOM codes of similar length. Indeed, when $m = 2$ and $3$, the PG(m,2) WOM codes have been shown to be optimal [21]. However, the construction presented here yields a new family of WOM codes that have simple encoding and decoding algorithms, and shows that variable-information WOM codes may also be obtained from incidence structures.

In general, designing efficient WOM codes from incidence structures requires low-weight incidence vectors, and intersections of these structures that can point to specific messages. In the case of EG(m,2), the $(m-2)$th-order Reed-Muller code was chosen so that the corresponding minimum-weight codewords would be planes and therefore have low weight. Since any two distinct planes intersect in zero, one, or two points, taking unions of multiple planes does not uniquely designate any one particular point when multiplicity is considered. The authors are interested in using other structures that may be exploited in designing WOM codes where multiplicity can be incorporated, and are currently working on designing WOM codes from general bipartite graphs using insights gained from the rewriting rules of the geometric constructions.

3 Using binary WOM codes on multilevel cells 

The development of flash memory cells on q > 2 levels has renewed interest in efficient coding 
strategies for 'generalized' write-once memories, i.e., those with greater than two states per cell. 
Applying binary WOM codes to multilevel cells provides a baseline against which efficient multilevel coding schemes can be compared. In this section we examine construction methods for adapting binary WOM codes for use on multilevel cells.

One way to use binary codes² on $q$-level cells is to read the cells modulo 2. One naive approach is to let the set of codewords consist of all cell state vectors that reduce modulo 2 to a binary codeword. A more efficient application of a $(v)^t/n$ code to $q$-level cells is to increase the charge of all cells to 1 after the $t$th write, and then employ the code again. We will refer to this scheme as the complement scheme, since reduction modulo 2 either reveals a WOM codeword or the complement of a codeword. More precisely, in the complement scheme, let $x$ denote the information message and let $c^i(x)$ be a codeword that represents $x$ on the $i$th write. We reuse the binary WOM code by taking $c^{t+i}(x) = c^i(x) + \mathbf{1}$ for $i \le t$, where $\mathbf{1}$ is the all-ones vector. Similarly, after $mt$ writes, the cell values are increased to $m$, and we set $c^{mt+k}(x) = c^k(x) + m \cdot \mathbf{1}$ for $k = 1, \dots, t$. Note that this scheme guarantees $(q-1)t$ writes. Table 3 shows Example 1.3 adapted to $q = 3$-level cells in this way.

² The idea of reducing the cell state vectors modulo 2 was also used in [22] to adapt classical codes for use on multilevel cells.

Information   1st write   2nd write   3rd write   4th write
00            000         111         111         222
01            100         011         211         122
10            010         101         121         212
11            001         110         112         221

Table 3: Rivest-Shamir code adapted to $q = 3$ levels.
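The complement scheme can be sketched for the Rivest-Shamir code as follows; the function names are ours. Reading the cells modulo 2 is enough to decode here because each output bit of the Rivest-Shamir decoder is a sum of two cell bits, which is unchanged when all three bits are complemented; for a general WOM code the decoder would also have to recognize complemented codewords. For $q = 3$ the sketch regenerates the columns of Table 3.

```python
# Rivest-Shamir code (Table 1): CODEWORDS[i][x] is c^{i+1}(x)
CODEWORDS = [
    {"00": (0, 0, 0), "01": (1, 0, 0), "10": (0, 1, 0), "11": (0, 0, 1)},
    {"00": (1, 1, 1), "01": (0, 1, 1), "10": (1, 0, 1), "11": (1, 1, 0)},
]


def complement_scheme(codewords, q):
    """q-ary codewords for writes 1..(q-1)t, using c^{rt+i}(x) = c^i(x) + r * 1."""
    t = len(codewords)
    return [
        {x: tuple(bit + r for bit in word) for x, word in codewords[i].items()}
        for r in range(q - 1)      # r = number of times every cell was raised
        for i in range(t)
    ]


def rs_decode_mod2(state):
    """Read the cells modulo 2, then apply the Rivest-Shamir decoder."""
    a1, a2, a3 = (level % 2 for level in state)
    return f"{(a2 + a3) % 2}{(a1 + a3) % 2}"


print(complement_scheme(CODEWORDS, 3)[2])
# {'00': (1, 1, 1), '01': (2, 1, 1), '10': (1, 2, 1), '11': (1, 1, 2)}
```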

We will use this simple scheme as a basis for comparison when considering the following methods of adapting binary WOM codes to $q$ levels.

Construction: Consider a $(2^k)^t/n$ WOM code. Let $x$ be a binary information sequence of length $k$, and let $U(x) = \{u : u = c^i(x) \text{ for some } i = 1, \dots, t\}$. Let $s$ be a length-$n$ cell state vector representing the message $x$. Given $s$, suppose we want to write a new message $y \ne x$. Let $V$ be the set of $n$-tuples with all entries even (possibly 0) and less than $q$. We present two strategies.

• Strategy A: To minimize the number of cells that are increased, search the set $U(y) + V$ for the representation whose difference from $s$ requires the fewest cells to increase. Thus, look for $s' \in U(y) + V$ such that $s' \ge s$ (componentwise, all entries of $s'$ are at least as large as those of $s$) and, further, such that $s$ and $s'$ differ in the fewest places, i.e., the Hamming weight $\mathrm{wt}_H(s' - s)$ is minimized. The new cell state vector is $s'$ and represents the new message $y$. When searching $U(y) + V$ as the cell values approach $q$, we omit the values of $s'$ that would cause a block erasure.

• Strategy B: To minimize the magnitude of the resulting cell state vector $s'$, search the set $U(y) + V$ for the representation whose maximum cell entry is smallest. If there is a tie, choose one that requires the fewest cells to increase, breaking any remaining ties arbitrarily. Thus, look for $s' \in U(y) + V$ such that $s' \ge s$ and the maximum entry of $s'$ is the smallest.
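Both strategies can be written as an exhaustive search over $U(y) + V$. The sketch below does this for the Rivest-Shamir code; the helper names are ours, as is the convention that the all-zero state represents 00 before the first write. The strategies as stated leave some ties open, so we break any remaining ties by the smallest total charge; with that assumption the search reproduces both sequences of Example 3.1.

```python
from itertools import product

# Rivest-Shamir code (Table 1): U[x] = [c^1(x), c^2(x)]
RS_U = {
    "00": [(0, 0, 0), (1, 1, 1)],
    "01": [(1, 0, 0), (0, 1, 1)],
    "10": [(0, 1, 0), (1, 0, 1)],
    "11": [(0, 0, 1), (1, 1, 0)],
}


def candidates(U, y, s, q):
    """Yield every s' in U(y) + V with s <= s' componentwise and entries <= q - 1."""
    for u in U[y]:
        for v in product(range(0, q, 2), repeat=len(s)):  # V: even entries below q
            new = tuple(a + b for a, b in zip(u, v))
            if all(old <= c <= q - 1 for old, c in zip(s, new)):
                yield new


def cells_increased(new, s):
    return sum(a != b for a, b in zip(new, s))


def strategy_a(U, y, s, q):
    """Fewest cells increased; remaining ties go to the smallest total charge."""
    return min(candidates(U, y, s, q),
               key=lambda t: (cells_increased(t, s), sum(t)), default=None)


def strategy_b(U, y, s, q):
    """Smallest maximum level, then fewest cells increased, then smallest total charge."""
    return min(candidates(U, y, s, q),
               key=lambda t: (max(t), cells_increased(t, s), sum(t)), default=None)


def run(strategy, messages, q, U=RS_U, start="00"):
    """Write `messages` in order from the all-zero state (assumed to represent `start`).

    Stops early when no admissible s' exists, i.e., a block erasure would be needed.
    """
    s, x, history = (0,) * len(U[start][0]), start, []
    for y in messages:
        if y != x:
            s = strategy(U, y, s, q)
            if s is None:
                break
            x = y
        history.append("".join(map(str, s)))
    return history


sequence = ["11", "00", "01", "10", "11", "01"]
print(run(strategy_a, sequence, q=4))  # ['001', '002', '102', '103', '203', '213']
print(run(strategy_b, sequence, q=4))  # ['001', '111', '211', '212', '312', '322']
```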

For specific codes, the strategies can be described more explicitly. For example, the following flash code encoding map is based on Example 1.3 and uses reduction modulo 2 to identify the decoding map from the cell state vectors to the variable vectors. Following Strategy A, the rewriting rule is as follows. Let $s$ be the current cell state vector representing the message $x$, and let $y$ be the new message to be written.

• If $x, y \in \mathbb{F}_2^2 \setminus \{00\}$:

  - If $s \bmod 2 = c^1(x)$, add the weight-one vector $w = c^2(y) - c^1(x)$ to the current state to obtain the new cell state vector $s' = s + w$.

  - If $s \bmod 2 = c^2(x)$, add $w = c^1(z)$, where $z \in \mathbb{F}_2^2 \setminus \{00, x, y\}$, to obtain $s' = s + w$.

• If $x = 00$, add $c^1(y)$ to $s$.

• If $y = 00$, then if $s \bmod 2 = c^1(x)$, add $c^1(x)$ to $s$; otherwise add $\mathbf{1} - c^2(x)$ to $s$.
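Each case above raises exactly one cell by one level. Because the decoder $(a_1, a_2, a_3) \mapsto ((a_2 + a_3) \bmod 2, (a_1 + a_3) \bmod 2)$ is linear, raising cell $j$ changes the decoded message by a fixed pattern (01 for the first cell, 10 for the second, 11 for the third), so the cell to raise is the one whose pattern equals $x \oplus y$. The sketch below encodes that reading of the rules; the names are ours. It gives up when the chosen cell is already at level $q - 1$, a case the rules above do not cover and where the search form of Strategy A would fall back to another representation.

```python
# Raising cell j (0-based) changes the decoded message by CELL_PATTERN[j],
# because the Rivest-Shamir decoder is linear over F_2.
CELL_PATTERN = {0: (0, 1), 1: (1, 0), 2: (1, 1)}
CELL_FOR_DIFF = {pattern: j for j, pattern in CELL_PATTERN.items()}


def rs_strategy_a_step(state, x, y, q):
    """Write message y over message x by raising a single cell by one level."""
    if x == y:
        return state
    j = CELL_FOR_DIFF[(x[0] ^ y[0], x[1] ^ y[1])]
    if state[j] == q - 1:
        return None  # not covered by the case analysis above
    return state[:j] + (state[j] + 1,) + state[j + 1:]


state, current, history = (0, 0, 0), (0, 0), []
for y in [(1, 1), (0, 0), (0, 1), (1, 0), (1, 1), (0, 1)]:
    state = rs_strategy_a_step(state, current, y, q=4)
    current = y
    history.append("".join(map(str, state)))
print(history)  # ['001', '002', '102', '103', '203', '213'], as in Example 3.1
```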

Following Strategy B, the rewriting rule depends on the actual magnitude (in $\{0, \dots, q-1\}$) of each cell entry.

The general rule is to increase a subset of the cells such that the new vector reduces to either $c^1(y)$ or $c^2(y)$ modulo 2 and no one cell is allowed to gain too much charge.

Example 3.1 Using the rules above for the Rivest-Shamir WOM code in Example 1.3, suppose the following information sequence is to be stored in a given set of cells with $q = 4$ levels:

11 → 00 → 01 → 10 → 11 → 01

Following Strategy A, the sequence of cell state vectors is as follows:

A: 001 → 002 → 102 → 103 → 203 → 213

Following Strategy B, the sequence of cell state vectors is as follows:

B: 001 → 111 → 211 → 212 → 312 → 322

□

Example 3.2 To further illustrate the different strategies, consider writing the sequence 1 → 2 → 1 → 3 using the PG(2,2) WOM code in Example 2.4, where the labeling of the Fano plane is as in Figure 1. Following Strategies A and B, the sequences of cell state vectors are as follows:

A: (1000000) → (1001000) → (1002000) → (1002001)
B: (1000000) → (1001000) → (1001101) → (1101111)

□

3.1 Analysis of Strategies A and B 

The expected number of writes for floating codes was studied in [23] [24], and can be more important than the worst-case analysis in determining which codes to use in practice. Code constructions in [15] guarantee $(q-1) + \lfloor (q-1)/2 \rfloor$ writes for a $k = 2$-dimensional message space and $n = 2$ cells. The same paper also proved the existence of floating codes that achieve $(q-1)n - o(n)$ writes as $n \to \infty$ for fixed $k$ and $q$. Asymptotically optimal codes for the average case with $k = 2$ have been constructed, for which the expected number of writes grows like $n(q-1) - o(q)$ [24]. Both cases include the assumption that only one cell level changes at each write, which is reasonable when $n \gg 2^k$. However, since Strategies A and B are intended to be used for any WOM code, not just those that meet this criterion, we do not use this assumption.

The guaranteed number of writes using Strategy B for the $(4)^2/3$ Rivest-Shamir WOM code on $q$-level cells is $2(q-1)$. This can be seen by examining a sequence of messages that causes a maximum number of cell increases under Strategy B. For example, the alternating sequence of messages 00 → 01 → 00 → 01 → 00 → ⋯ → 01 → 00 has cell state vector sequence 000 → 100 → 111 → 211 → 222 → ⋯ → $(q-1)(q-2)(q-2)$ → $(q-1)(q-1)(q-1)$. Observe that for every two writes, the cell state vector does not increase a cell level more than once, and both representations of a given message are used. Thus, the guaranteed number of writes using Strategy B is $2(q-1)$.

The guaranteed number of writes using Strategy A for the $(4)^2/3$ Rivest-Shamir WOM code on $q$-level cells is also $2(q-1)$. Again we consider a sequence of messages that causes the maximum number of cell increases. For example, the alternating sequence of messages 00 → 01 → 00 → 01 → 00 → ⋯ → 01 → 00 → 01 has cell state vector sequence 000 → 100 → 200 → 300 → 400 → ⋯ → $(q-2)00$ → $(q-1)00$ → $(q-1)11$ → $(q-1)22$ → ⋯ → $(q-1)(q-1)(q-1)$. Observe that the first $q-1$ writes follow the Strategy A protocol to increase the fewest cells, but that once any cell attains the maximum charge, the strategy continues to write using the next best representation for each message. Thus, a total of $2(q-1)$ writes are guaranteed.

The following theorem shows that the guaranteed number of writes for both Strategies A and B 
is at least as good as the complement scheme for any general binary WOM code. 

Theorem 3.3 Let $C$ be a $(v)^t/n$ binary WOM code. Then the guaranteed number of writes obtained by applying either Strategy A or Strategy B to $C$ on $q$-level flash cells is at least $(q-1)t$.

Proof: We proceed by induction on $q$. For $q = 2$, the WOM code already guarantees $t$ writes. So assume the hypothesis holds for $q = r$; that is, for any sequence of messages, we are guaranteed at least $(r-1)t$ writes using Strategy A or B. Now consider the case $q = r + 1$. Then, for any sequence of $(r-1)t$ messages, using Strategy A or Strategy B, by the induction hypothesis we will reach a cell state vector $(c_1, c_2, \dots, c_n)$ with entries $c_i \le r - 1$, $i = 1, 2, \dots, n$. We can now artificially increase each cell level to $r - 1$ at the end of the $(r-1)t$ writes to yield the cell state vector $(r-1, r-1, \dots, r-1)$. Without loss of generality, the cell state vector $(r-1, r-1, \dots, r-1)$ can be thought of as the all-zero vector $(0, 0, \dots, 0)$. It is now easy to see that either Strategy A or Strategy B will allow us to write at least $t$ more times using the original $t$ writes of the binary WOM code $C$. Thus, a total of $rt$ writes is guaranteed for either strategy when $q = r + 1$, which proves the result. □

To see whether the lower bound of $(q-1)t$ writes in Theorem 3.3 is met, the weight distributions of the different representations of each message in the original WOM code have to be taken into account. For example, for two-write WOM codes in which the minimal-weight representation of each message is unique, the guaranteed number of writes is $2(q-1)$, as above. The authors are currently looking at how to classify when a WOM code meets this lower bound using the weight distributions of the message representations.

Strategies A and B applied to the Rivest-Shamir code each guarantee two writes when $q = 2$ and four writes when $q = 3$, whereas the expected number of writes using either strategy for this code (assuming a uniform distribution on the message space) is approximately 2.47 for $q = 2$ and 4.89 for $q = 3$. Note that the simple application of the Rivest-Shamir code to $q$-level cells using the complement scheme requires $q \ge 3$ to get more than two guaranteed writes. Figure 4 compares the average number of writes of the complement scheme, Strategy A, and Strategy B on $q$-level cells when applied to the binary Rivest-Shamir WOM code from Example 1.3. In Monte Carlo simulations, $10^5$ random message sequences were generated and the number of writes was recorded for the three different methods. As shown in Figure 4, the strategies applied to the Rivest-Shamir code exhibit a noticeable gain over the complement scheme, and this gain grows as $q \to \infty$. However, the average number of writes for each strategy is still quite far from the capacity limit on the number of writes possible for representing four messages per write using three cells on $q$ levels (see Section 1).
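A minimal Monte Carlo sketch in the spirit of these simulations follows. The counting convention is our own reading of the setup, not taken from the paper: the all-zero state represents 00, the first message is uniform over all four messages (writing 00 first leaves the cells unchanged but still counts as a write), and each later message is uniform over the three messages different from the current one. Under these assumptions, a short Markov-chain calculation gives exactly $89/36 \approx 2.47$ expected writes for $q = 2$, in line with the value quoted above. The tie-break by smallest total charge and all function names are our choices.

```python
import random
from itertools import product

# Rivest-Shamir code: U(x) = {c^1(x), c^2(x)} from Table 1
RS_U = {
    "00": [(0, 0, 0), (1, 1, 1)],
    "01": [(1, 0, 0), (0, 1, 1)],
    "10": [(0, 1, 0), (1, 0, 1)],
    "11": [(0, 0, 1), (1, 1, 0)],
}
MESSAGES = list(RS_U)


def key_a(new, s):
    """Strategy A: fewest cells increased, then smallest total charge (our tie-break)."""
    return (sum(a != b for a, b in zip(new, s)), sum(new))


def key_b(new, s):
    """Strategy B: smallest maximum level first, then the Strategy A criteria."""
    return (max(new),) + key_a(new, s)


def next_state(s, y, q, key):
    """Best s' in U(y) + V with s <= s' <= q - 1 componentwise, or None."""
    options = [
        tuple(a + b for a, b in zip(u, v))
        for u in RS_U[y]
        for v in product(range(0, q, 2), repeat=len(s))
    ]
    options = [t for t in options if all(old <= c <= q - 1 for old, c in zip(s, t))]
    return min(options, key=lambda t: key(t, s), default=None)


def writes_until_erasure(q, key, rng):
    """Count the messages stored before some write would need a block erasure."""
    s, x = (0, 0, 0), "00"            # the all-zero state represents 00
    y = rng.choice(MESSAGES)          # first message: uniform over all four
    count = 0
    while True:
        if y != x:
            s = next_state(s, y, q, key)
            if s is None:
                return count
            x = y
        count += 1
        y = rng.choice([m for m in MESSAGES if m != x])  # then: one of the other three


def average_writes(q, key, trials=20000, seed=1):
    rng = random.Random(seed)
    return sum(writes_until_erasure(q, key, rng) for _ in range(trials)) / trials


print(average_writes(2, key_a))   # should be close to 89/36, about 2.472
print(average_writes(3, key_a), average_writes(3, key_b))
```

For $q = 2$ the set $V$ contains only the zero vector, so both strategies reduce to the plain Rivest-Shamir code; differences between them only appear for $q \ge 3$.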
Strategies A and B did not exhibit much gain over the complement scheme when the PG(2,2) code in Example 2.4 was simulated for small $q$. We believe that this is due to the near-optimality of the PG(2,2) WOM code. Further, we believe that, in general, the closer a code is to optimal, the less it will benefit from the strategies, since reapplying the code under the complement scheme already yields an efficient code.



Figure 4: Comparison of the average number of writes achieved by Strategies A and B and the complement scheme, using the Rivest-Shamir code on $q$-level cells.

In [24], two coding schemes are presented that have a similar flavor to Strategies A and B, but apply in the different setting of random floating codes. In that work, the authors propose two random coding schemes: a "Simple scheme" that randomly chooses to increase a single cell by one, and a "Least scheme" that chooses a message representation that increases the coordinate with the lowest charge level. In contrast, Strategies A and B in this paper apply to any WOM code without the assumption that only one cell increases at each write. We also expect that the performance of codes under these strategies will differ more for certain classes of codes and when non-uniform distributions on the message space are considered. Further analysis of the performance of the strategies for different WOM codes is underway, including quantifying their average performance using both uniform and non-uniform distributions on the message space.



4 Concatenated error-correcting flash codes 

In this section we consider ways that code concatenation may be used to obtain new WOM or flash 
codes. Let $[n, k, d]_q$ denote a classical q-ary linear code of block length n, dimension k, and minimum
distance d. Two classical codes may be concatenated as follows.

Definition 4.1 Let A be an $[n_1, k_1, d_1]_{q^{k_2}}$ code and B be an $[n_2, k_2, d_2]_q$ code. Then the concatenated
code $C = A \boxtimes B$ is an $[n_1 n_2, k_1 k_2]_q$ code with minimum distance at least $d_1 d_2$, with outer code A
and inner code B. The $k_1$ information symbols (each chosen from a $q^{k_2}$-ary alphabet) are first encoded into
$n_1$ symbols using A. Each of the encoded symbols is then represented by $k_2$ q-ary symbols. Each group of these
$k_2$ symbols is then encoded into $n_2$ q-ary symbols using B. Thus, $n_1 n_2$ encoded symbols are obtained to form
a codeword in C.
The above concatenation may be seen by the following mapping:

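$$\mathbb{F}_{q^{k_2}}^{k_1} \xrightarrow{\;A\;} \mathbb{F}_{q^{k_2}}^{n_1} \xrightarrow{\;q\text{-ary representation}\;} \mathbb{F}_q^{n_1 k_2} \xrightarrow{\;B\;} \mathbb{F}_q^{n_1 n_2}$$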
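As a small sanity check of Definition 4.1, the sketch below concatenates a hypothetical outer code A, the $[2, 1, 2]_4$ repetition code over $\mathbb{F}_4$, with a hypothetical inner code B, the $[3, 2, 2]_2$ single-parity-check code. Here q = 2 and $k_2 = 2$, so each $\mathbb{F}_4$ symbol (labeled 0..3) is written as two bits. These two codes were chosen only because repetition over $\mathbb{F}_4$ needs no field arithmetic. A brute-force distance computation confirms a $[6, 2, 4]_2$ code, that is, $n_1 n_2 = 6$, $k_1 k_2 = 2$, and minimum distance $d_1 d_2 = 4$ for this pair.

```python
from itertools import combinations

K2 = 2  # number of binary digits used to represent one F_4 symbol


def outer_encode(symbol):
    """Hypothetical outer code A: [2, 1, 2] repetition over F_4 (symbols labeled 0..3)."""
    return [symbol, symbol]


def to_qary(symbol):
    """q-ary (here binary) representation of an F_4 symbol, most significant digit first."""
    return [(symbol >> i) & 1 for i in reversed(range(K2))]


def inner_encode(bits):
    """Hypothetical inner code B: [3, 2, 2] single-parity-check code over F_2."""
    return bits + [sum(bits) % 2]


def concat_encode(message):
    """Encode with A, expand each symbol into K2 bits, then encode each group with B."""
    codeword = []
    for symbol in outer_encode(message):
        codeword += inner_encode(to_qary(symbol))
    return codeword


codewords = [concat_encode(m) for m in range(4)]
min_distance = min(sum(a != b for a, b in zip(u, v)) for u, v in combinations(codewords, 2))
print(len(codewords[0]), len(codewords), min_distance)  # prints: 6 4 4
```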

Concatenating classical codes with binary WOM or flash codes yields codes with both error 
correction and rewrite capabilities. 

Several researchers have observed that an outer $(2^k)^t/n$ WOM code A, when concatenated with
an inner $[m, 1, m]_2$ repetition code B, yields a $(2^k)^t/(nm)$ binary WOM code $C = A \boxtimes B$, where C can
correct $\lfloor (m-1)/2 \rfloor$ errors [13], [25], [14]. We expand on these ideas to obtain codes for multilevel flash cells.

A code $C_w \boxtimes C_r$, where $C_w$ is a WOM code and $C_r$ is a length-m repetition code, can be
employed as an error-correcting code on q-level cells with the following strategy. On the first write,
the binary codeword is written on the cells. Errors can be detected by a majority decision among
each set of m consecutive positions. For subsequent writes and error correction, we read the q-ary
vector as a binary word of $C_w \boxtimes C_r$ by reducing the values in the cells modulo 2. In particular,
if a one was erroneously written on the first write in a cell that should have contained a zero, we
correct the error by increasing the level of the cell to 2, which is read as 0 (modulo 2). The error
has then been corrected in the binary word that is read, and the code can correct $\lfloor (m-1)/2 \rfloor$ errors on each
write. Subsequent writes are achieved by increasing chosen cell levels to obtain the desired parity,
modulo 2.
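The reading and repair steps can be sketched in Python as follows. The function names are hypothetical, and so is the choice to raise every cell that disagrees with its block majority. The sketch covers only the reduced-binary read, majority decoding of each length-m block, and the one-level increase that restores a cell's parity. It does not implement the WOM rewrite itself.

```python
def reduced_binary(levels):
    """Read q-level cells as a binary vector by reducing each level modulo 2."""
    return [level % 2 for level in levels]


def majority_decode(bits, m):
    """Decode each block of m consecutive bits (m odd) by majority vote."""
    return [int(sum(bits[i:i + m]) > m // 2) for i in range(0, len(bits), m)]


def repair(levels, m, q):
    """Raise by one every cell whose parity disagrees with its block's majority.

    Raises ValueError if such a cell is already at the top level q - 1.
    """
    decoded = majority_decode(reduced_binary(levels), m)
    repaired = list(levels)
    for i, level in enumerate(levels):
        if level % 2 != decoded[i // m]:
            if level >= q - 1:
                raise ValueError(f"cell {i} cannot be raised above level {q - 1}")
            repaired[i] = level + 1
    return repaired


levels = [3, 3, 2]  # q = 4, n = 1, m = 3
print(reduced_binary(levels), majority_decode(reduced_binary(levels), 3), repair(levels, 3, 4))
# prints: [1, 1, 0] [1] [3, 3, 3]
```

With q = 4 and m = 3, the levels (3, 3, 2) reduce to (1, 1, 0) and decode to 1 by majority. The repair then raises the third cell from level 2 to level 3 so that its parity agrees with the decoded bit.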

The following theorem uses this method to obtain an error-correcting WOM code. Note that 
errors can occur in either direction and are assumed to be of magnitude one. 

Theorem 4.2 Let $C_w$ be a $(2^k)^t/n$ WOM code and let $C_r$ be the $[m, 1, m]_2$ repetition code. The
code $C_w \boxtimes C_r$ is a $(2^k)^t/(mn)$ $\lfloor (m-1)/2 \rfloor$-error-correcting WOM code on SLCs. Moreover, applied to
q-level cells and using the reduced binary vector representation, $C_w \boxtimes C_r$ is a $(2^k)^{t'}_q/(mn)$ flash code,
where $t' = \lceil (q-1)t/3 \rceil$ and $\lfloor (m-1)/2 \rfloor$ errors can be corrected at each write.

Proof: For q = 2 the resulting code is a $(2^k)^t/(mn)$ $\lfloor (m-1)/2 \rfloor$-error-correcting WOM code. For any
q, the length-mn code has dimension k. We show that the worst-case number of rewrites is $\lceil (q-1)t/3 \rceil$.
Note that $C_w \boxtimes C_r$ is still a binary code, but we use it on the q-ary cells by reading the information
stored in the cells via the reduced binary vectors. Up to $\lfloor (m-1)/2 \rfloor$ errors can be detected and corrected
at each write. Note that in this scheme, error correction consists of increasing the charge level of the
cell by one to correct the parity in that entry of the reduced binary vector. In the worst case, an
error occurs in the same position on every write, and so that position sees an increase of three levels
at each write. However, in the absence of errors we could achieve $(q-1)t$ writes due to the rewriting
capability of $C_w$ and the reapplication of the WOM code on q-level cells. Thus, the worst-case
number of writes in the error case is $\lceil (q-1)t/3 \rceil$. □

As an example of the reading process, if q = 4, n = 1, and m = 3, the sequence (332) in a cell-state
vector would be read as (110) in $C_w \boxtimes C_r$ and decoded to (111) using the majority rule. As an example
of the error-correction process, consider a cell that is meant to be increased to 0 (modulo 2). If an
error causes the cell to instead be read as 1 (modulo 2), then the charge is increased again to correct it.
Thus that cell has seen a total increase of three levels on that write cycle: one for the intended write,
one caused by the error, and one for the correction. A similar idea of increasing the cell levels to correct
errors has also been considered in [14], [10].

Example 4.3 Let $C_w$ be the $(4)^2/3$ WOM code defined in Example 1.3 and let $C_r$ be the $[3, 1, 3]_2$
repetition code. Then the code $C_w \boxtimes C_r$ is a $(4)^2/9$ single-error-correcting WOM code on SLCs
(first observed in [13]). Moreover, on q-level cells, the code $C_w \boxtimes C_r$ is a $(4)^{\lceil 2(q-1)/3 \rceil}_q/9$
single-error-correcting flash code. □
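The SLC part of Example 4.3 can be checked exhaustively with the short Python sketch below. It assumes the standard Rivest-Shamir tables (the labeling in the paper's example may differ) and repeats each codeword bit three times. The check confirms that the second write never lowers a bit and that any single bit flip in a stored 9-bit word is corrected.

```python
from itertools import product

# Standard Rivest-Shamir (4)^2/3 code (assumed labeling): first-generation
# codewords have weight <= 1, second-generation codewords are their complements.
FIRST = {0: (0, 0, 0), 1: (1, 0, 0), 2: (0, 1, 0), 3: (0, 0, 1)}
DECODE_FIRST = {word: msg for msg, word in FIRST.items()}
DECODE_SECOND = {tuple(1 - b for b in word): msg for msg, word in FIRST.items()}
M = 3  # repetition length of C_r


def rs_decode(word):
    table = DECODE_FIRST if sum(word) <= 1 else DECODE_SECOND
    return table[tuple(word)]


def rs_encode(msg, current=None):
    """First write if current is None; otherwise the second write over `current`."""
    if current is None:
        return FIRST[msg]
    if rs_decode(current) == msg:
        return tuple(current)  # stored message already correct, nothing to change
    return tuple(1 - b for b in FIRST[msg])


def majority(word):
    """Majority-decode each block of M repeated bits back to one bit."""
    return tuple(int(sum(word[i:i + M]) > M // 2) for i in range(0, len(word), M))


def cw_cr_encode(msg, current=None):
    """Encode with C_w (Rivest-Shamir), then repeat every bit M times (C_r)."""
    inner = None if current is None else majority(current)
    return tuple(b for b in rs_encode(msg, inner) for _ in range(M))


def cw_cr_decode(word):
    return rs_decode(majority(word))


def flip(word, i):
    return word[:i] + (1 - word[i],) + word[i + 1:]


ok = True
for m1, m2 in product(range(4), repeat=2):
    w1 = cw_cr_encode(m1)
    w2 = cw_cr_encode(m2, w1)
    ok &= all(a <= b for a, b in zip(w1, w2))  # second write only raises bits
    for word, msg in ((w1, m1), (w2, m2)):
        ok &= cw_cr_decode(word) == msg
        ok &= all(cw_cr_decode(flip(word, i)) == msg for i in range(len(word)))
print(ok)  # prints: True
```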






Example 4.4 Let $C_w$ be the $(7)^4/7$ code based on PG(2,2) from [1] and let $C_r$ be the $[3, 1, 3]_2$
binary repetition code. Then the code $C_w \boxtimes C_r$ is a $(7)^4/21$ single-error-correcting WOM code on
SLCs. Moreover, on q-level cells, the code $C_w \boxtimes C_r$ is a $(7)^{\lceil 4(q-1)/3 \rceil}_q/21$ single-error-correcting flash
code. □

We next show how to obtain a flash code with increased error-correction by concatenating an 
inner flash code with an outer classical code. 

Theorem 4.5 Let $C_1$ be an $[n_1, k_1]_{2^{k_2}}$ code that corrects e errors, and let $C_2$ be a $(2^{k_2})^t_q/n_2$
E-error-correcting WOM code. Then $C_1 \boxtimes C_2$ is a $(2^{k_1 k_2})^t_q/(n_1 n_2)$ WOM code capable of correcting
$(E+1)(e+1) - 1$ errors.

Proof: The length and dimension of $C_1 \boxtimes C_2$ are immediate. Note that this code achieves t writes
since the inner flash code is capable of t writes. The minimum number of errors that must occur for
a decoding failure is $(E+1)(e+1)$, where E + 1 errors occur in each of e + 1 distinct length-$n_2$
q-ary expansions of symbols of $C_1$. Any smaller number of errors can be corrected by the length-$n_1 n_2$
concatenated code. □

For comparison, we show the concatenation of an inner binary repetition code with a classical
binary outer code for use on q-level flash cells.

Theorem 4.6 Let C be an $[n, k, d]_2$ e-error-correcting code and let $C_r$ be the $[2m+1, 1, 2m+1]_2$
binary repetition code. Then the code $C \boxtimes C_r$ for q-level cells results in a $(2^k)^t_q/((2m+1)n)$ flash
code that corrects $me + m + e$ errors and guarantees $t = \lceil (q-1)/2 \rceil$ writes.

Proof: The length and dimension follow from the construction. Concatenating two binary codes
results in a binary code, but we use reduction modulo 2 to adapt the code to q-ary cells. Errors that
result in a change in parity of a cell can be corrected by increasing the level of the cell by one. In
the worst case, an error occurs in the same cell at every write. In order to correct it, the cell level is
increased by one so that it has the same parity as the entry before the error occurred. Thus this code
guarantees $\lceil (q-1)/2 \rceil$ writes. Note that the outer code can correct up to e errors and the inner code can
correct up to m errors. Thus, the concatenated code can tolerate $(m+1)(e+1) - 1 = me + m + e$
errors. □

Observe that this use of a classical code on multilevel cells gives better error-correction capabilities
than the code in Theorem 4.2, but it can tolerate fewer rewrites, since its only rewrite capability comes
from the number of levels.

Example 4.7 Let C be an $[n, k, d]_2$ e-error-correcting code and let $C_r$ be the $[3, 1, 3]_2$ binary repetition
code. Then the code $C \boxtimes C_r$ for q-level cells yields a $(2^k)^t_q/(3n)$ flash code that corrects 2e + 1 errors
and guarantees $t = \lceil (q-1)/2 \rceil$ writes. □



5 Conclusions 

We showed how the structure of finite Euclidean geometries can be used to obtain new variable 
information WOM codes. We also introduced several strategies for adapting WOM codes to multilevel 
cells that allow for a greater rewrite capability than classical codes adapted for multilevel coding 
schemes. Combined with concatenation, these codes also have the ability to correct multiple errors. 
We are currently investigating the use of other incidence structures, including finite geometries over
$\mathbb{F}_q$, for developing new coding schemes for multilevel flash memories.